add intra_distance_score evaluation#103
add intra_distance_score evaluation#103omkarakaya wants to merge 13 commits intomaciejkula:masterfrom
Conversation
|
Thank you for this! I think to move this forward, we'll want to do two things.
On making this fast: as far as I understand, we want to compute the cosine distance between the columns i and j of the (sparse) interaction matrix. There are a couple of things that we can do to make this faster than the current implementation:
This way we don't need the cache either.
We can make it even faster by using the fact that indices are sorted and using numba, but this is probably a good first step. |
maciejkula
left a comment
There was a problem hiding this comment.
Thanks for the changes, I made some additional comments.
More generally:
- Can you give me a sense of how slow/fast this is? Could you post how long it takes for the Movielens 100k dataset, for instance?
- We will need a version of this for sequence-based models. Have a look at the MRR routines to see how the two differ.
|
|
||
| distances = [] | ||
| test = test.tocsr() | ||
| mat = train.tocoo().T.tocsr() |
There was a problem hiding this comment.
Is the coo conversion necessary?
| """ | ||
|
|
||
| distances = [] | ||
| test = test.tocsr() |
There was a problem hiding this comment.
What do we need test for? For knowing which users to compute the predictions for?
Maybe a cleaner way of doing the same would be to allow the user to pass in an optional array of user ids for which the metric should be computed.
| for user_id, row in enumerate(test): | ||
| predictions = -model.predict(user_id) | ||
| rec_list = predictions.argsort()[:k] | ||
| distance = [ |
There was a problem hiding this comment.
I personally find nested list comprehensions very confusing. Could we use nested for loops here?
|
Thank you very much for the comments. I agree. In addition to these comments, I'm still looking for a way to move distance function to input parameters. The current function will be the default one since it's really fast (I'll post exact times separately) |
… intra_distance
|
I've fixed the review comments except for the sequence-based models solution. calling I should check how we can achieve to run this on sequence-based models; So we need a matrix like this to calculate the distance; where each row represents an item. Any advise or guidance would be greatly appreciated. |
Issue #90 (Diversification metrics for evaluation)
Intra_distance diversity is probably mostly considered metric of diversity.
Therefore I'd like to add it first.